Abstract

Representation sharing can reduce the memory footprint of a program by sharing one representation
between duplicate terms. The most common implementation of representation sharing in functional
programming systems is known as hash-consing. In the context of Prolog, representation sharing
has been given little attention. Some current techniques that deal with representation sharing are
reviewed. The new contributions are: (1) an easy implementation of input sharing for findall/3; (2)
a description of a sharer module that introduces representation sharing at runtime. Their realization
is shown in the context of the WAM as implemented by hProlog. Both can be adapted to any WAM-like
Prolog implementation. The sharer works independently of the garbage collector, but it can be
made to cooperate with the garbage collector. Benchmark results show that the sharer has a cost
comparable to the heap garbage collector, that its effectiveness is highly application dependent, and
that its policy must be tuned to the collector.

1 Introduction

Data structures with the same value during the rest of their common life can share the
same representation. This is exploited in various contexts, e.g., by the intern method for
Strings in Java, by hash-consing in functional languages (Goto 1974), and by data
deduplication during backup. In programming language implementation, hash-consing is probably
the best known representation sharing technique: hash-consing was invented by Ershov in
(Ershov 1958) and used by Goto in (Goto 1974) in an implementation of Lisp. Originally,
hash-consing was performed during all term creations so that no duplicate terms occurred
during the execution of a program. (Appel and Gonçalves 1993) explores the idea of using
hash-consing only during generational garbage collection: the new generation contains
non-hash-consed terms, and on promotion to the older generation, they are hash-consed:
for the first time, a representation sharing technique is cooperating with the garbage
collector. Our approach is most closely related to (Appel and Gonçalves 1993), but also has
some important differences.

Representation sharing has been given little explicit attention in the context of Prolog
implementation. However, the issue pops up from time to time. Here are some historical
highlights:

• in 1989, in his Diplomarbeit, Ulrich Neumerkel (Neumerkel 1989) mentioned how
by applying DFA-minimization to Prolog terms, certain programs can run in linear
space (instead of quadratic); there was no implementation; in Section 8 his example
program is used as a benchmark

• in 1991, (Sahlin and Carlsson 1991) ends with the sentence: It still remains to be seen,
however, what we meant by "folding identical structures"; the current paper offers a
solution to this mysterious sentence

• in a 1995 comp.lang.prolog post, Edmund Grimley-Evans (Grimley-Evans 1995)
asked for more sharing in findall/3, i.e., he wanted the solution list of a call to
findall/3 to share with the generator; input sharing (Marien and Demoen 1993) does
exactly that; Section 3 describes input sharing more precisely and how it can be
implemented efficiently

• in a 2001 Logic Programming Pearl (O'Keefe 2001), R. O'Keefe mentioned a
findall/3 query that could benefit from representation sharing in the answers; as for
the previous bullet, Section 3 contains the solution

• in 2002, (Demoen 2002) gave a fresh view on garbage collection for Prolog; it
detailed a number of desirable (optimal) properties of a garbage collector, one of which
is the introduction of representation sharing (albeit naming it differently)

• in May 2009, Ulrich Neumerkel posted an excerpt of his Diplomarbeit in
comp.lang.prolog and urged implementations to provide for more representation
sharing, either during unification, or during garbage collection; he used the term
factoring; we prefer representation sharing; the current paper is the result of exploring
its implementation issues

The paper is organized as follows: Section 2 starts by describing what we mean by
representation sharing and lists a number of more or less popular forms of representation
sharing in Prolog. Section 3 describes how to retain input sharing for findall/3 and
evaluates our implementation on a number of benchmarks.

Section 4 sets the scene for the focus of the rest of the paper: general sharing for Prolog.
Section 5 forms the intuition on such sharing, while Section 6 introduces the notion
of absorption: it shows when individual cells can share their representation and the
approximation that works for us. It then lifts representation sharing from individual cells to
compound terms and discusses some properties of our notion of representation sharing.
Section 7 explains our implementation of representation sharing based on the earlier
decisions. Section 8 discusses the benchmarks and the experimental results. Section 9 shows
extensions of the basic implementation, variations and related issues. Section 10 discusses
related work, and we conclude in Section 11.

We have used hProlog 3.1.* as the Prolog engine to experiment with, but it is clear that
everything can be ported to other WAM-like systems as well: we make that more explicit
later on. hProlog is a descendant of dProlog as described in (Demoen and Nguyen 2000).
SICStus Prolog 4.1.1 serves as a yardstick to show that the hProlog time and space figures
are close to a reliable state of the art system. All benchmarks were run on an Intel Core2
Duo Processor T8100 2.10 GHz.

We assume the reader to be familiar with the WAM (Aït-Kaci 1991; Warren 1983) and
Prolog (Clocksin and Mellish 1984). We use the term heap when others use global stack,
i.e., the place where compound terms are allocated. We use local stack and environment
stack interchangeably and denote it by LS in pictures.

2 Representation Sharing versus Hash-Consing 



Consider the predicates main1 and main2 defined as

main1 :-

main2 :-

In a naive¹ implementation, the execution of ?- main1. just before the call to use/2, results
in a memory situation as in the left of Figure 1. In this figure, the heap cell with Z is a
self-reference in the WAM. Clearly, the terms X and Y are exactly the same ever after they
have been created, and therefore they can share the same representation: that sharing can
be seen in the right of Figure 1 and in the code for the predicate main2.

Fig. 1. Representation Sharing
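A minimal pair of predicates that fits this description can be sketched as follows. The term f(1,2,Z) and the body of use/2 are assumptions chosen for illustration; only the shape (two equal terms containing a shared free variable Z, followed by a call to use/2) is taken from the text.

```prolog
% Illustrative sketch: f(1,2,Z) and the body of use/2 are hypothetical.

% main1 builds the same term twice; a naive implementation allocates
% two separate f/3 structures on the heap.
main1 :-
    X = f(1,2,Z),
    Y = f(1,2,Z),
    use(X, Y).

% main2 builds the term once and lets Y refer to the representation of X.
main2 :-
    X = f(1,2,Z),
    Y = X,
    use(X, Y).

% use/2 only checks that the program cannot tell the two terms apart.
use(X, Y) :-
    X == Y.
```

Both ?- main1. and ?- main2. succeed: at the program level the two versions are indistinguishable, and only the number of heap cells differs.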

Hash-consing is usually associated with the technique that keeps a hash table of terms
and during term creation checks whether a term is new or exists already in the hash table.

An implementation with hash-consing usually changes the representation of terms, and 
consequently the code that deals explicitly with this representation. For Prolog the affected 
code would be general unification and built-in predicates. That is too intrusive for our 
aims: we intend our implementation of representation sharing to be easy to integrate in 
other Prolog systems and there should be no global impact. So, we will keep the usual 
(WAM) term representation and do not touch any part of the implementation, except for 
the sharer module that introduces representation sharing. Given the complexity of current 
Prolog systems, this seems to us the only way to make representation sharing accepted by 
implementors. 



¹ I.e., an implementation without compile time common subexpression elimination; however, note the danger of
such optimization in the presence of destructive assignment: see Section 9.






Phuong-Lan Nguyen and Bart Demoen 



Some Forms of Representation Sharing for Prolog 

Prolog implementations already provide some specific representation sharing. Here are a 
few examples: 

• in older implementations, the predicate copy_term/2 copies ground terms; in
newer implementations (starting probably with SICStus Prolog (Carlsson 1990))
copy_term/2 avoids copying ground (sub)terms; this means that the second argument
can have some representation sharing with the first argument; however, note that
mutable ground terms must be copied by copy_term/2, because otherwise sharing would
become observable at the program level; we discuss this issue further in Section 9

• some programs contain ground terms at the source level; a typical example is
the second argument of a goal like member(Assoc, [fx,fy,xfx,xfy,yfx,xf,yf]); ECLiPSe
(Wallace et al. 1997) pre-allocates such ground terms, and makes sure that any time
such a fact or goal is called, the ground term is re-used; Mercury performs this
compile-time optimization as well

• when two terms are unified, they can share a common representation in the forward
execution; at various stages in its life, BinProlog (Tarau 1991) enforced such sharing
by (in WAM speak) redirecting the S-tagged pointer of one of the two terms and
(conditionally) trailing this change so that on backtracking it can be undone; if trailing
is not needed, then the savings can be huge; otherwise, the locality of access can
be improved, but memory and time savings can be negative; a similar technique was
already used for strings only in the Logix implementation of Flat Concurrent Prolog
(Hirsch et al. 1987)

In each of the above cases, the implementor of the Prolog system decided for more 
representation sharing than would be the case in a more straightforward implementation. 
Application programmers and library developers usually take care as well to let their run- 
time data structures share common parts. 

In the above, copy_term/2 and unification are built-in predicates that have a chance to
increase representation sharing. In Section 3, findall/3 is added to this shortlist.



3 Input Sharing for findall/3 

In (Marien and Demoen 1993), the notion of input sharing was introduced in the context
of findall/3. Input sharing consists of a solution in the output from findall/3 (its third
argument) sharing with the input to findall/3 (its second argument).

Later, in the Logic Programming Pearl (O'Keefe 2001), it is suggested that findall/3
could avoid repeatedly copying the same terms over and over again: this would improve
the space complexity of some queries that use findall/3, from $O(n^2)$ to $O(n)$. However, R.
O'Keefe suggests that hash-consing should be used, with the consequence that the time
complexity remains the same: our implementation of input sharing (which is exactly
what is needed here) improves both the time and space complexity. The example used
in (O'Keefe 2001) is rather complicated, so for now, we use as an illustration a piece of
simple Prolog code that was posted in (Grimley-Evans 1995); we changed the names of
the predicates and variables.
findall_tails(L, Tails) :- findall(Tail, is_tail(L, Tail), Tails).

is_tail(L, L).
is_tail([_|R], L) :- is_tail(R, L).

all_tails([], [[]]).
all_tails(L, [L|S]) :- L = [_|R], all_tails(R, S).



Clearly, goals of the form ?- findall_tails(L,Tails). and ?- all_tails(L,Tails). with a ground
argument L succeed with the same answer Tails. E.g.,

?- findall_tails([1,2,3],Tails).
Tails = [[1,2,3],[2,3],[3],[]]

The usual implementation of findall/3 copies over and over again parts of the input list L,
and this results in quadratic behavior (in the length of L) for findall_tails/2, while all_tails/2
is linear, both in space and time! Clearly, with enough input sharing the findall_tails/2
query could be linear.
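The quadratic figure follows from a simple count, given here as a sketch that counts list cells only and ignores the elements themselves. For a ground list of length $n$, is_tail/2 yields the tails of lengths $n, n-1, \dots, 0$, and a copying findall/3 duplicates the list cells of each of them:

\[
\sum_{k=0}^{n} k \;=\; \frac{n(n+1)}{2}
\]

list cells are copied, in addition to the $n+1$ cells of the result list itself. all_tails/2 allocates only the $n+1$ cells of the result list, because every element of that list is a pointer into L.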

In the following sections we show how a traditional findall/3 implementation in the
context of the WAM can be easily adapted to cater for input sharing. An alternative
copy-once implementation of findall/3 is also shown.

Before going into the details, it is worth pointing out the limitations of input sharing. 
Clearly, if L is a list with non-ground elements, the two queries 



?- findall_tails(L, Tails).
?- all_tails(L, Tails).



yield different answers. The first query makes fresh variants of the variables in each of the 
solutions in Tails, while the second query does not. As an example: 



?- findall_tails([X,Y,Z],L), numbervars(L,0,_).
L = [[A,B,C],[D,E],[F],[]]

?- all_tails([X,Y,Z],L), numbervars(L,0,_).
X = A, Y = B, Z = C,
L = [[A,B,C],[B,C],[C],[]]



This means we can use an input sharing version of findall/3 when the arguments of the 
generator are either ground or free: the danger is only in terms containing variables. 
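The condition on the generator can be tested at runtime with standard built-ins. The sketch below is one way to do it; the predicate names args_ground_or_free/1 and checked_findall/3 are hypothetical, and the wrapper assumes that a predicate sharing_findall/3 with the same argument order as findall/3 may or may not be available in the host system.

```prolog
% args_ground_or_free(+Goal): every argument of Goal is either ground
% or an unbound variable (the condition stated in the text).
args_ground_or_free(Goal) :-
    callable(Goal),
    Goal =.. [_|Args],
    ground_or_free_list(Args).

ground_or_free_list([]).
ground_or_free_list([A|As]) :-
    (   var(A) -> true ; ground(A) ),
    ground_or_free_list(As).

% checked_findall/3 (hypothetical wrapper): use the sharing version only
% when it exists and the generator passes the check; otherwise fall back
% to the ordinary, copying findall/3.
checked_findall(Template, Generator, SolList) :-
    (   args_ground_or_free(Generator),
        current_predicate(sharing_findall/3)
    ->  sharing_findall(Template, Generator, SolList)
    ;   findall(Template, Generator, SolList)
    ).
```

For example, args_ground_or_free(is_tail([1,2,3], Tail)) succeeds, while args_ground_or_free(is_tail([X,2,3], Tail)) fails because the first argument is neither ground nor free. Control constructs such as conjunctions fail the check as well and therefore take the copying path, which errs on the safe side.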



3.1 The Implementation of findall/3

The hProlog implementation of findall/3 follows the same pattern as in many systems:






findall(Template, Generator, SolList) :-
    findall_init(Handle),
    (
        call(Generator),
        findall_add(Template, Handle),
        fail
    ;
        findall_get_solutions(SolList, Handle)
    ).



For simplicity, we have left out all error checking and error recovery code. The predicate
findall_init/1 returns a handle, so that the particular invocation of findall/3 is identified: this
is used for correct treatment of nested calls to findall/3. findall_add/2 uses that handle, and
copies the Template to a temporary zone. findall_get_solutions/2 uses the handle as well:
it retrieves the complete list of solutions from the temporary zone and unifies it with the
third argument to findall/3.

The next section describes how to turn this code into code that shares the input. 

3.2 The basic Idea of Input Sharing for findall/3

The predicate findall_add/2 in our implementation of findall/3 is just a version of
copy_term/2: at the implementation level, they both use the same C function for the
actual copying. The same is true for findall_get_solutions/2.

The first idea might be to use an implementation of copy_term/2 that avoids copying
ground terms. However, in the context of findall/3, groundness is not enough: the ground
term must also be old enough, so that backtracking (over the Generator) cannot alter it.
To be more precise, anything ground that survives backtracking over the Generator need
not be copied by findall_add/2. Or put still another way: anything ground before the call to
findall(Template,Generator,SolList) need not be copied by findall_add/2.

Such terms can be recognized easily: their root resides in a heap segment that is not 
younger than the call to findall/3. 

So we need to be able to identify the older heap part relevant to a particular call to 
findall/3. That is quite easy in the WAM: we just remember the relevant heap pointer! 

3.3 findall/3 with Input Sharing: the Implementation

We use two new low-level built-in predicates:

• current_heap_top(-): unifies the argument with (an abstraction of) the current value
of the heap pointer H

• set_copy_heap_barrier(+): sets a global (C-)implementation variable (named
copy_heapbarrier) to the heap pointer value corresponding to its argument

The following code shows how the new built-ins are used: 
sharing_findall(Template, Generator, SolList) :-
    current_heap_top(Barrier),
    findall_init(Handle),
    (
        call(Generator),
        set_copy_heap_barrier(Barrier),
        findall_add(Template, Handle),
        fail
    ;
        set_copy_heap_barrier(Barrier),
        findall_get_solutions(SolList, Handle)
    ).



An additional small change needs to be made to the implementation of findall_add/2
(and findall_get_solutions/2) as well: when a term is about to be copied and it is older
than copy_heapbarrier, only the root pointer to this term is copied. It amounts to adding a
statement like

if (struct_addr < copy_heapbarrier) { *whereto = struct_addr; continue; }

at a few places in the C code of copy_term/2: this piece of code just copies the top pointer
of the structured term instead of copying it recursively. The C variable struct_addr holds
the address of the structure about to be copied.

For explanatory reasons, we have shown the implementation of sharing_findall/3 as a
variant of the basic implementation of findall/3 using two new built-ins. However, one can
also fold the functionality of these new built-ins into adapted versions of findall_[init, add,
get_solutions]: the top-of-heap at the moment of calling sharing_findall/3 is then stored
in the data structures belonging to that particular call. This top-of-heap at the moment of
calling sharing_findall/3 must also be appropriately treated by the garbage collector.

3.4 An Example 

The heap and temporary findall zone are shown in Figure 2 for the very simple query

?- findall(X, X=f(1,2,3), L).

The left part of the picture shows three snapshots during the execution of the query
without input sharing. The right part shows the corresponding snapshots with input sharing.
The snapshots are taken

• just before findall_add/2 is executed: the temporary zone is still empty

• just after findall_add/2 is executed; at the left, the temporary zone contains a copy of
the term f(1,2,3); at the right, there is a pointer to the term on the heap

• just after findall_get_solutions/2 is executed: the temporary zone can be discarded; at
the left, the solution list contains a copy of f(1,2,3); at the right, there is just a pointer
to the old term on the heap

The space savings are clear.

Fig. 2. Findall without and with Input Sharing
3.5 Copy-once findall/3

The usual implementation of findall/3 copies the solutions twice. BinProlog was probably
the first implementation copying the solution only once, by means of a technique named
heap lifting or more popularly a bubble in the heap (Tarau 1992). Currently, the BinProlog
implementation (Tarau and Majumdar 2009) relies on engines for findall/3. Mercury also
uses a copy-once findall (named solutions/2): as Mercury relies on the Boehm collector
(Boehm and Weiser 1988), there is no memory management hassle with a bubble in the
heap.

It is rather easy to implement a copy-once findall/3 in any Prolog system that has
non-backtrackable destructive assignment (with nb_setarg/3) as in hProlog or SWI-Prolog
(Wielemaker et al. 2008)²:

² Note that SWI-Prolog uses nb_linkarg/3 as the name for hProlog's nb_setarg/3
copy_once_findall(Template, Generator, SolList) :-
    Term = container([]),
    (
        call(Generator),
        Term = container(PartialSolList),
        copy_term(Template, Y),
        nb_setarg(1, Term, [Y|PartialSolList]),
        fail
    ;
        Term = container(FinalSolList),
        reverse(FinalSolList, SolList)
    ).



As before, this code can be enhanced with the newly introduced built-ins to yield a
copy_once_sharing_findall. If copying the solutions dominates the execution, the copy-once
findall/3 is about twice as fast as the regular findall/3. However, its main drawback is that
it consumes (for the benchmarks below) about three times as much heap space. The reason
is that nb_setarg/3 must freeze the heap if its third argument is a compound term. The
heap-lifting technique (which we have not implemented in hProlog) does not have this
drawback.



3.6 Experimental Evaluation 

Input sharing improves (sometimes) the complexity (space and time), and the constant
overhead is really very small, as can be judged from the changes needed to implement
it. One could therefore argue that benchmarks are not needed. Even so, we present two
benchmarks: one is the findall_tails/2 example (see Section 3). We start with a findall/3
related query³ from (O'Keefe 2001): this pearl is about tree construction and traversal. It
contains the following text:

    Query requires at least $O(n^2)$ space to hold the result. If findall/3 copied terms
    using some kind of hash consing, the space cost could be reduced to $O(n)$, but not
    the time cost, because it would still be necessary to test whether a newly generated
    solution could share structure with existing ones.

Note that the n above is the number of nodes in the tree, not the tree depth: the number
of nodes is roughly $4^{depth}$ where depth is the depth of the tree.

We needed to make a slight change to the program from (O'Keefe 2001): in its original
form it contains a mk_tree/2 predicate defined as

³ f1(N) in the Appendix
mk_tree(D, node(D,C)) :-
    (   D > 0 ->
        D1 is D - 1,
        C = [T,T,T,T],
        mk_tree(D1, T)
    ;   C = []
    ).



Because of the conjunction C = [T,T,T,T], mk_tree(D1, T), the heap representation of the
constructed tree is linear in the first argument D, even though it has an exponential number
of nodes: indeed, the constructed tree has a lot of internal sharing. Such internal sharing is
retained by most reasonable implementations of copy_term/2 and by findall_add/2⁴.

In order to test what (O'Keefe 2001) really meant, we have changed the particular
conjunction to

C = [T1,T2,T3,T4],
mk_tree(D1, T1), mk_tree(D1, T2),
mk_tree(D1, T3), mk_tree(D1, T4)

so that the size of the representation of the tree is linear in the number of nodes in the
tree (and exponential in D). This code rewrite achieves the desired effect because Prolog
systems typically don't perform the analysis needed to notice that T1, T2, T3 and T4 are
declaratively the same value, and neither is this detected at runtime. See the Appendix for
all code necessary to run the benchmark.
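The two constructions can be put side by side in a small self-contained sketch. The names mk_tree_shared/2, mk_tree_unshared/2 and count_nodes/2 are introduced here for illustration only; the benchmark code itself is the one in the Appendix.

```prolog
% Original construction: the four children are one and the same term T,
% so the heap representation grows linearly with D.
mk_tree_shared(D, node(D, C)) :-
    (   D > 0
    ->  D1 is D - 1,
        C = [T,T,T,T],
        mk_tree_shared(D1, T)
    ;   C = []
    ).

% Modified construction: four separately built, structurally equal
% children, so the heap representation grows with the number of nodes.
mk_tree_unshared(D, node(D, C)) :-
    (   D > 0
    ->  D1 is D - 1,
        C = [T1,T2,T3,T4],
        mk_tree_unshared(D1, T1),
        mk_tree_unshared(D1, T2),
        mk_tree_unshared(D1, T3),
        mk_tree_unshared(D1, T4)
    ;   C = []
    ).

% count_nodes(+Tree, -N): number of node/2 terms met when the term is
% traversed as a tree (shared subterms are counted once per occurrence).
count_nodes(node(_, Children), N) :-
    count_children(Children, 0, N0),
    N is N0 + 1.

count_children([], N, N).
count_children([T|Ts], N0, N) :-
    count_nodes(T, M),
    N1 is N0 + M,
    count_children(Ts, N1, N).
```

For depth 3, both predicates produce terms that are identical under ==/2, and count_nodes/2 reports 1 + 4 + 16 + 64 = 85 nodes for either of them. The difference between the two versions is visible only in the heap consumption.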



3.6.1 The modified Tree Benchmark: Results 

Table 1 shows timings (when considered meaningful) and space consumption for queries
?- f1(Depth) with different values of Depth. Times are reported in milliseconds, space in
bytes. We have chosen SICStus Prolog for comparison with another system because the
SICStus Prolog implementation performed better and more reliably than the other systems
we tried. Moreover, the trend of the measurements with other systems was basically the
same.

The timings without sharing do not show anything interesting complexity-wise: neither
of the implementations without input sharing can deal with more than about 5000 nodes.
The input sharing implementation on the other hand can go easily up to one million nodes.
The heap consumption columns give a good picture of how the heap size grows: the
non-input-sharing implementations show a quadratic dependency of the heap consumption on
the number of nodes. Only hProlog input sharing shows a linear dependency.

Table 1 shows clearly that our simple implementation to enforce input sharing is very
effective and performs actually better than hoped for in (O'Keefe 2001). Indeed, we achieve
linear space and time complexity for the f1(Depth) query. Hash-consing would not be able
to do that.



⁴ A notable exception is Yap.
Depth |       hProlog       | hProlog input sharing |   SICStus Prolog
      |  time        space  |  time        space    |  time        space
------+---------------------+-----------------------+---------------------
    1 |     -          964  |     -          368    |     -          980
    2 |     -        11332  |     -         1840    |     -        10964
    3 |     -       158020  |     -         9776    |     -       155348
    4 |     -      2394436  |     -        49712    |     -      2379476
    5 |   156     37599556  |     -       242224    |   250     37523156
    6 |  2616    598030660  |     -      1143344    |  6820    597659348
    7 |     -            -  |     -      5272112    |     -            -
    8 |     -            -  |   156     23884336    |     -            -
    9 |     -            -  |   652    106721840    |     -            -
   10 |     -            -  |  2804    471626288    |     -            -

Table 1. Heap consumption in bytes and time in msecs for the tree benchmark

3.7 The tails Benchmark

Table 2 shows the space consumption for the tails benchmark. The timings are
meaninglessly small for the variants with sharing, and therefore only shown for the regular findall
columns. The Length/1000 column indicates the length of the ground input list L to queries
of the form all_tails(L,Tails) and [sharing_]findall(Tail,is_tail(L,Tail),Tails).



            |                  hProlog                  |       SICStus Prolog
Length/1000 | findall           findall with  all_tails | findall           all_tails
            |  time     space   sharing                 |  time     space
------------+-------------------------------------------+----------------------------
          1 |     …     26034        8            …     |     …     26034       …
          2 |   444    104068       16           16     |     …         …       …
          3 |     …         …       24            …     |     …         …       …
          4 |     …    416136       32           32     |     …         …      32
          5 |     …         …        …           40     |  3650    650170      40
          6 |     …         …       48           48     |  5080    936204      48
          7 |     -         -       56           56     |     -         -      56
          8 |     -         -       64           64     |     -         -      64
          9 |     -         -       72           72     |     -         -      72
         10 |     -         -       80           80     |     -         -      80
        100 |     -         -      800          800     |     -         -     800
       1000 |     -         -     8000         8000     |     -         -    8000

Table 2. Heap consumption in KiB and time in msecs - tails benchmark

findall/3 with input sharing clearly beats the findall/3 without input sharing. SICStus
Prolog can do larger sizes with the ordinary findall/3 implementation than hProlog: the latter
runs out of memory earlier because of its different memory allocation and heap garbage
collection policy.

We have tried to measure the overhead of our method, but it is too small to show up 
meaningfully in any of our experiments. 



3.8 Conclusion on Input Sharing for findall/3

Already in (Grimley-Evans 1995) there was a demand for sharing between the input to
findall/3 and its output. Also (O'Keefe 2001) points out that this would be beneficial to
some programs. Optimal input sharing would attempt to share all (sub)terms that are
ground just before the call to findall/3. Checking this at runtime can be involved and
costly. Our implementation approximates that by just checking that the root of a term is
old enough, and relying on the programmer (or some other means) to use sharing_findall/3
only when this simple check implies that the whole term was ground at the moment of the
call to sharing_findall/3. This is in particular true in the common case that the generator of
findall/3 (its second argument) is a goal of which every argument is ground or free: that was
the case for our benchmark findall_tails/2. That condition on the generator can be easily
checked before calling sharing_findall/3 and could also be derived by program analysis.

Our approach does not implement solution sharing: hash-consing, or maybe even better
tries, could do the job. Sections 4 and later provide a more general and lightweight solution
to representation sharing.

In (O'Keefe 2001), one can also read:

    One referee suggested that Mercury's 'solutions/2' would be cleverer. A test in the
    0.10 release showed that it is not yet clever enough.

As Mercury (Somogyi et al. 1996) relies on the Boehm allocator and collector for its
memory management, it is quite difficult to devise a simple dynamic test whether a (ground)
term is old enough: on the whole, a Mercury implementation does indeed not benefit from
keeping the address order of terms consistent with their age. On the other hand, in the
WAM, such a test comes naturally with the needs of a strict heap allocation discipline and
conditional trailing.

As a conclusion, we think we have succeeded in providing input sharing for findall/3 
with minimal change to the underlying Prolog execution engine: any Prolog implementa- 
tion with a heap allocation strategy similar to the WAM can incorporate it easily. How to 
present the functionality in a safe way to the user is a language design issue and as such 
beyond the scope of this paper. 



4 General Representation Sharing for Prolog 

(Appel and Gonçalves 1993) adapts a copying collector to perform hash-consing for the
data in the older generation. Since we would like our implementation of the representation
sharer to be a model for other Prolog implementations, we cannot just copy that idea.
Indeed, hash-consing requires a serious adaptation of the term representation, and moreover
Prolog systems typically have sliding collectors, the exceptions being hProlog and BinProlog.
Therefore we want to investigate representation sharing in a way that does not require
a change in term representation, and that is independent of the details of the garbage
collector: this will make it easier for Prolog systems to implement their own sharing module
based on our experience. (Appel and Gonçalves 1993) argues that garbage collection time
is a good moment to perform hash-consing, but there is no inherent need to do it only then.
Still, we agree basically with (Appel and Gonçalves 1993): it is better to avoid putting any
effort in sharing with dead terms.

We use as Prolog goals in the examples share and gc: the former performs representation
sharing, the latter just performs garbage collection. By keeping the two separated,
the issues become clearer, i.e., we make no assumptions on the workings of the garbage
collector.

(Baker 1992) shows that the combination of tabling and hash-consing is particularly
powerful: since duplicate terms do not occur, equality of terms can be decided by a single
pointer comparison instead of by traversing the whole terms. However, in that context and
in its original form, hash-consing guarantees representation sharing all the time, while that
is not our aim. Unfortunately, (Baker 1992) does not show experimental data for
hash-consing without tabling.



5 Representation Sharing in Prolog: Examples 

Two issues make representation sharing in Prolog-like languages different from other lan- 
guages: the logical variable and backtracking. Subsequent subsections show by example 
how these affect the possibilities for representation sharing. 



5.1 Sharing within the same Segment

The first example in Section 2 shows the simplest case of sharing: the two terms are
identical, in the same heap segment (as delimited by the HB pointers in the choicepoints) and
ground at creation time.

The next example shows that identical ground terms in the same segment cannot always 
share their representation: 



main3 :-
    T1 = f(a),
    T2 = f(X),
    (
        X = a,
        share
    ;
        write(T1 \== T2)
    ).

While executing the query ?- main3, just before the execution of share, the terms T1 and
T2 are identical, ground, and they are completely within the same segment. However, it
would be wrong to make them share their representation, since in the failure continuation,
they are no longer identical. Loosely speaking, the occurrence of trailed variables in a term
makes the term unsuitable for representation sharing.
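The effect can be observed in any Prolog system without a sharer. The sketch below is an illustration under stated assumptions: the predicate name main3_demo/0 is hypothetical, the call to share is replaced by a comparison with ==/2, and an explicit fail forces execution into the failure continuation.

```prolog
% main3_demo/0: shows that two terms can be identical in the forward
% execution and distinct again after backtracking.
main3_demo :-
    T1 = f(a),
    T2 = f(X),
    (   X = a,                       % binding made after the choicepoint: trailed
        (   T1 == T2
        ->  format("first branch: identical~n")
        ;   format("first branch: distinct~n")
        ),
        fail                         % backtrack: the binding of X is undone
    ;   (   T1 \== T2
        ->  format("second branch: distinct~n")
        ;   format("second branch: identical~n")
        )
    ).
```

The query ?- main3_demo. prints "first branch: identical" followed by "second branch: distinct". Had the two terms been merged into one representation in the first branch, the second branch would have seen T1 and T2 as the same term.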
5.2 Sharing between Segments

The previous examples dealt with representation sharing of terms that live in the same 
segment. The next example shows an issue with representation sharing of terms that live in 
different segments. Since we do not want to mix this issue with trailed heap locations, the 
example works with ground terms. 



main4 :-
    T1 = f(a),
    (
        T2 = f(a),
        share, use(T1,T2)
    ;
        use(T1)
    ).





During the execution of the query ?- main4, T1 and T2 live in two different segments.
T1 lives in the oldest segment, as seen in the left of Figure 3⁵. Since T1 is used after
backtracking, the natural thing is to keep the representation of the oldest term, because it
potentially lives longest. So the introduced sharing representation is as in the right of Figure
3. Alternatively, one could use as shared representation the one in the younger segment,
but then the heap should be frozen, so that on backtracking the value of T1 does not get
lost. We consider this a bad alternative, but a slight variation on the same example shows
that the choice is not so clear cut:

Fig. 3. Representation Sharing of two Terms in different Segments



    main5 :-
        T1 = f(a),
        use(T1),
        (
            T2 = f(a), share, gc, use(T2)
        ;
            dontuseT1
        ).

    main6 :-
        T1 = f(a),
        use(T1),
        (
            T2 = f(a), gc, use(T2)
        ;
            dontuseT1
        ).



The code of main5 and main6 differs only in the call to share in main5.

• with sharing in main5: share keeps one representation of f(a) and puts it in the oldest segment; gc cannot reclaim that representation, because T2 is not dead; after backtracking to dontuseT1, the f(a) term is still on the heap

• without sharing in main6: at the point gc kicks in, T1 is unreachable and its representation disappears; this means that after backtracking to dontuseT1, the heap is empty

^ In Figure 3, the dashed line indicates the heap segment barrier

This example shows that representation sharing between terms in different segments can 
lead to a higher heap consumption, or more invocations of the garbage collection. 

Finally, it is clear that mutable terms should not share their representation: it is in general impossible to know whether two mutable terms will be identical for the rest of their common lifetime. We deal with mutable terms in more detail in Section 9.

6 Sharable Terms and Absorption 

The examples in the previous section give some intuition on what we mean by represen- 
tation sharing, and also about its pitfalls. The examples also have indicated that we are 
working towards an implementation of a sharer that introduces sharing between two terms T1 and T2 by keeping the representation of one of them, say T1, and making T2 point to it. We call this process "T1 absorbs T2". This leads naturally to considering the notion "T1 can absorb T2".

The most general definition of "T1 can absorb T2" would be that the sequence of solutions to the running program does not change by letting T1 absorb T2. That condition is of course not decidable, so we need a workable approximation to it.

The next sections explore the notion can absorb further, first by focussing on represen- 
tation sharing for individual heap cells and then by considering compound terms. 

6.1 Representation Sharing for Individual Heap Cells 

It pays off to study the most basic representation sharing of all: between two individual heap cells.

Clearly, when two cells, say c1 and c2, have different contents (and are live), neither of them can absorb the other. And when the two cells have identical addresses, they have absorbed each other already. So, we are left with the possibilities that

• c1 and c2 are in the same heap segment or not

• c1 and/or c2 is trailed or not

Without loss of generality, we assume that c1 is older than c2.

This results in the eight combinations shown in Figure 4: a trailed cell is shaded. The contents of the two cells at the moment of the snapshot are the same, but shaded cells will be set to free (a self-reference in the WAM) on backtracking to the appropriate choicepoint. The horizontal dashed lines now indicate one or more heap segment separations. The vertical lines just separate the different cases.

Fig. 4. The 8 combinations of two cells

a: c1 can absorb c2 and also vice versa, because the two cells have identical contents, and that will remain so in the forward and in the backward computation

bcd: in the forward computation, the two cells remain identical, but not after backtracking; so no representation sharing can take place, and neither can absorb the other

e: on backtracking, c2 dies before the older cell c1, but for the duration of their common life, the two cells are identical, so representation sharing is allowed: c1 can absorb c2, but not the other way around

fg: these cases are similar to cases bcd above: as soon as one of the trailed cells is untrailed by backtracking, the contents of c1 and c2 differ; therefore representation sharing is not allowed; neither can absorb the other

h: there are two possibilities now:

    (a) at the moment the older cell is untrailed, backtracking also recovers the segment in which the newer cell resides; this means that the newer cell dies, so the fact that the older cell is set to free does not prevent representation sharing; so c1 can absorb c2 (and not the other way around); this happens if c1 was trailed before the segment of c2 is final, or to put it differently: if the moment of trailing c1 is not after the segment of c2 is closed by a choicepoint; i.e., if c2 dies not later than c1 is untrailed, c1 can absorb c2

    (b) otherwise, representation sharing is disallowed; neither cell can absorb the other

Anticipating an implementation, we notice that it is important to be able to check quickly whether a cell is trailed. One bit, appropriately placed, is enough for that: that bit could be in the heap cells themselves, or it could be allocated in an array parallel to the heap. This would make cases a and e easy to identify.

To detect case h(a) however, we also need to retrieve quickly from a heap address, the heap segment number in which it was trailed. That requires more setup, and it would slow down the sharer. We think the expected gain in space is too small to make this worthwhile. Instead, we went for disallowing sharing in case h(a), so that our notion of can absorb becomes quite simple and leads to a simple decision procedure. In the following piece of code, pc1 and pc2 are pointers to heap cells c1 and c2:

    boolean can_absorb(cell *pc1, cell *pc2)
    {
        if (*pc1 != *pc2) return(FALSE);
        if (trailed(pc1)) return(FALSE);
        if (trailed(pc2)) return(FALSE);
        return(pc1 < pc2);
    }



If cell c1 can absorb cell c2, every (tagged) pointer to c2 can be changed into a (tagged) pointer to c1: this change does not affect the outcome of the execution. Note that it is immaterial whether the cell containing the (tagged) pointer to c2 is trailed or not.

Since trailing prevents a cell from being able to absorb, or being absorbed, it is in the interest of maximizing the chances for representation sharing to keep the trail tidy: this is in many Prolog systems done at the moment a cut (!/0) is executed. Also during garbage collection, the trail can be tidied.
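As an illustration of the decision procedure of this section, the following is a minimal, self-contained sketch in C. It assumes that the trailed bit is kept in an array parallel to the heap (one of the two options mentioned above). The names heap, trailed_bit and mark_trailed, the fixed HEAP_SIZE, and the use of a plain integer as a heap cell are assumptions made for this sketch only; they are not taken from hProlog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned long cell;        /* one tagged heap word (toy model, assumption) */

#define HEAP_SIZE 8
cell heap[HEAP_SIZE];              /* heap[0] is the oldest cell */
bool trailed_bit[HEAP_SIZE];       /* one flag per heap cell, parallel to the heap */

/* Record that a heap cell occurs on the trail; called once per trail entry. */
void mark_trailed(const cell *p)
{
    assert(p >= heap && p < heap + HEAP_SIZE);
    trailed_bit[p - heap] = true;
}

/* Constant-time trailed check through the parallel array. */
bool trailed(const cell *p)
{
    return trailed_bit[p - heap];
}

/* c1 can absorb c2: same contents, neither cell trailed, c1 strictly older. */
bool can_absorb(const cell *pc1, const cell *pc2)
{
    if (*pc1 != *pc2) return false;
    if (trailed(pc1)) return false;
    if (trailed(pc2)) return false;
    return pc1 < pc2;
}
```

With heap[0] and heap[1] holding the same value and neither trailed, can_absorb(&heap[0], &heap[1]) holds while the reverse call does not, which reflects the choice to keep the older cell.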

6.2 Representation Sharing for Compound Terms 

The representation of a compound term with principal functor foo/n in the WAM is an S-tagged pointer to an array of (n+1) contiguous heap cells, the first of which contains foo/n, and the next n cells contain one cell of the representation of one argument each. We name this array of (n+1) heap cells the body of the term.

The idea of one term absorbing the other is that after absorption, there is only one body 
instead of two, but there are still two cells with an S-tagged pointer pointing to it. See Figure 5.

Clearly a necessary condition for such representation sharing is that the two bodies have 
the same contents. Moreover, since a term body always belongs to a single segment, the 
condition worked out for absorption for two individual cells must hold for each pair of 
corresponding body elements. We arrive at the following 



Definition: Term T1 can absorb term T2 if T1 is older than T2, T1 == T2 and neither T1 nor T2 contains trailed cells.

Figure 5 shows two bodies that fulfill the conditions.




Fig. 5. Left: the bodies fulfill the conditions for representation sharing. 
Right: sharing has been performed 

Note that the example exhibits a situation we have not yet described: a variable chain. Dereferencing must be stopped when a trailed cell is found.

Note the similarity of the above analysis with the one for variable shunting in (Sahlin and Carlsson 1991).

An algorithm that given two terms decides whether one can absorb the other is now easily constructed. However, the naive use of this algorithm would be very inefficient.



6.3 Properties of our notion of can absorb 

Before going to the implementation of representation sharing, it is good to understand some 
properties of the can absorb relation: the optimality (if any) of our algorithms depends 
crucially on those properties. 

It is clear that can absorb is not symmetric: a newer term cannot absorb an older term 
in a different segment. Neither is can absorb anti-symmetric: case a in Section 6.1 shows that.

We denote by absorbed(x,y) the result of letting term x absorb term y, of course under 
the condition that x can absorb y. 

An important part of our definition of can absorb is that the terms do not contain trailed cells: it implies that a candidate term for absorbing or being absorbed can be recognized without knowing the other term, i.e., one only checks whether it contains trailed cells. By keeping information about visited terms, one can ensure that this information is gathered in time proportional to the size of the heap.

From the definition, it also follows that can absorb is transitive: 

    (x can absorb y) && (y can absorb z) ==> x can absorb z



Finally, the absorption process is also associative, i.e., 

    absorbed(absorbed(x, y), z) == absorbed(x, absorbed(y, z))



(of course under the condition that x can absorb y and y can absorb z). This means that the 
order in which absorption takes place is immaterial: the end result is the same. 

Together, these properties allow for a basically linear sharing algorithm, on condition 
that term hashing is perfect. With a less than perfect hash function, the algorithm might 
need to traverse some terms more than once. 



7 Implementation of Representation Sharing 

We have taken hProlog as the platform for an implementation of representation sharing. 
hProlog is based on the WAM (Aït-Kaci 1991; Warren 1983) with a few differences:

• the choicepoint stack and environment stack are not interleaved as in the WAM, but 
separate stacks as in SICStus Prolog 

• free variables only reside on the heap; i.e., there are no self-references in the environment stack, just as in Aquarius Prolog (Van Roy and Despain 1992)



• hProlog supports some more native types like char, string and bigint; it also has attributed variables

hProlog employs a mark-and-copy type of garbage collector, with its roots in (Bevemyr and Lindgren 1994), and it preserves segment order as described in (Vandeginste et al. 2002). Most other systems use a sliding collector based on (Appleby et al. 1988). hProlog does not implement variable shunting.

hProlog is a direct descendant of dProlog (Demoen and Nguyen 2000). Its purpose is to offer a platform for experiments in WAM-like Prolog implementation. Its high performance gives the experiments an extra dimension of credibility.

The implementation uses two data structures: they can be seen in Figure 6. We name them the cached_hash table and the hashed_terms table. Together they form the sharer tables.

• cached_hash: this is an array the size of the WAM heap (or global stack) and can be thought of as parallel to the heap; its entries contain information about the corresponding heap cells; the information is one of the following three:

    - no-info: the corresponding heap cell has not been treated yet

    - impossible: the corresponding heap cell cannot participate in representation sharing; see Section 7.3 for more on this

    - a pointer to the hashed_terms table: the corresponding heap cell has been treated, and its sharing information can be found by following the pointer

• hashed_terms: this data structure contains records with two fields: hashvalue and term; suppose a pointer in the cached_hash points to a record in the hashed_terms, and the corresponding heap cell A is the entry point of term TermA, then

    - the hashvalue field in the record is the hash value of term TermA

    - the term field in the record is a pointer to a heap cell B that is the entry point of a term TermB that can absorb TermA (provided A and B are not the same cell); our implementation makes sure that the heap cell B is as old as possible, i.e., B is equal to A or older than A

Treating a heap cell consists in filling out the corresponding cell in the cached_hash table and possibly the hashed_terms table.

The hashed_terms table is actually implemented as a hash table: the hash value of a term modulo the size of the hash table determines the place in the hashed_terms, and a linked list of buckets is used to resolve collisions. Many other implementations of this hashed_terms table would be fine as well.

Our first description of the algorithm only tries to introduce sharing between structures (not lists). Therefore, for now, hashed_terms pointers can only appear in cached_hash cells corresponding to a heap cell containing a functor descriptor.

The main algorithm consists of two phases: 

• build: it builds the cached_hash and hashed_terms tables; during this phase nothing is changed to the heap; this phase treats all heap cells

• absorb: it performs all absorption possible by using the cached_hash and hashed_terms tables

In the algorithms below, we use beginheap and endheap for the pointers to the first (oldest) cell in the heap and the last (newest). We assume no cell is trailed, and come back to this point later.

7.1 Phase I: building the cached_hash and hashed_terms tables

The build phase performs the action compute_hash for each cell in the heap: the corresponding cell in the cached_hash is set to either impossible or to a pointer to the hashed_terms. The function compute_hash is always called with a tagged term as argument. In the code below, we use STRUCT as the tag of a pointer pointing to the functor cell of a compound term. In figures, this tag shows simply as S. The function call tag(p, STRUCT) returns such a STRUCT tagged pointer; the function untag has the opposite effect. A function call like tag(term) returns the tag of its argument.

Note that the following code ignores certain issues like checking whether a cell is trailed, and lists. We deal with them in Section 7.3.



    foreach p in [beginheap, endheap] && is_functor(*p)
        compute_hash(tag(p, STRUCT));    // ignore return value

    int compute_hash(p)
    {
        deref(p);
        switch tag(p)
        {
            case FREE:
            case ATOMIC:
                return(p);

            case STRUCT:
                p = untag(p, STRUCT);
                if (already_computed(p)) return(already_computed_hash(p));
                hashvalue = *p;
                foreach argument of structure p do
                    hashvalue += compute_hash(argument);
                save_hash(hashvalue, p);
                return(hashvalue);
        }
    }



The particular hash value computed above is not relevant for our discussion: in practice, 
there are better (more complicated) ways to compute hash values of terms. 

The function call already_computed(p) checks whether the corresponding element in the cached_hash table points to the hashed_terms table; already_computed_hash(p) returns the hash value previously computed (for the term starting at p) from the hashed_terms entry corresponding to p: in this way, re-computation (and re-traversal of the same term) is avoided.
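To make the effect of the cache concrete, here is a small self-contained sketch in C of the build phase on a toy heap. It is not the hProlog code: the cell layout (a tag plus two integers), the names push, cached and bodies_traversed, the hash mixing constants, and the reservation of 0 as "not treated yet" are all assumptions of this sketch. It handles only ground terms built from atoms and structures, and it stores the hash value directly in the parallel array instead of a pointer into hashed_terms.

```c
#include <assert.h>

enum tag { ATOM, FUNCTOR, STRUCT };
typedef struct { enum tag tag; unsigned val; unsigned arity; } cell;

#define HEAP_MAX 32
#define NO_INFO 0u                 /* assumption: 0 means "not treated yet" */

cell heap[HEAP_MAX];
unsigned heap_top = 0;
unsigned cached[HEAP_MAX];         /* hash per functor cell, parallel to the heap */
unsigned bodies_traversed = 0;     /* number of bodies whose arguments were walked */

/* Append one cell to the toy heap and return its index. */
unsigned push(enum tag t, unsigned val, unsigned arity)
{
    assert(heap_top < HEAP_MAX);
    heap[heap_top] = (cell){ t, val, arity };
    return heap_top++;
}

/* Hash of the term denoted by cell c (an ATOM cell or a STRUCT pointer). */
unsigned compute_hash(cell c)
{
    if (c.tag == ATOM) return 31u * c.val + 7u;
    unsigned f = c.val;            /* STRUCT: index of the functor cell */
    if (cached[f] != NO_INFO) return cached[f];   /* no re-traversal */
    bodies_traversed++;
    unsigned h = 17u * heap[f].val + heap[f].arity;
    for (unsigned i = 1; i <= heap[f].arity; i++)
        h = 31u * h + compute_hash(heap[f + i]);
    if (h == NO_INFO) h = 1u;      /* keep 0 reserved for "not treated yet" */
    cached[f] = h;
    return h;
}

/* Build phase: treat every functor cell found by a linear heap scan. */
void build(void)
{
    for (unsigned p = 0; p < heap_top; p++)
        if (heap[p].tag == FUNCTOR)
            compute_hash((cell){ STRUCT, p, 0 });
}
```

In this sketch each body is traversed exactly once, whatever the number of pointers to it, so the build phase stays proportional to the size of the toy heap.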

In the save_hash function that follows, we have left out collision handling: for the sake of the presentation, we assume perfect hashing.

    save_hash(hashvalue, p)
    {
        index = hashvalue % length(hashed_terms);
        cached_hash[p-beginheap] = hashed_terms + index;

        if (empty(hashed_terms[index]))
        {
            hashed_terms[index].term = p;
            hashed_terms[index].hashvalue = hashvalue;
            return;
        }

        // a non-empty entry might need to be adapted
        if (newer(hashed_terms[index].term, p)) hashed_terms[index].term = p;
    }



The last line in save_hash makes sure that the term pointed at in a hashed_terms entry is as old as possible. The reason is that it is safe to let an older term absorb a younger one.

Figure 6 shows how three equal terms are treated by compute_hash and the effect thereof on the cached_hash and hashed_terms tables.




(a) After treating middle f(a,b) (b) After treating younger f(a,b) (c) After treating older f(a,b) 

Fig. 6. Three identical terms are treated during the build phase 
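The presentation above assumes perfect hashing. As a hedged illustration of how collisions could be handled, the sketch below chains entries per bucket and compares terms structurally before reusing an entry. The toy heap layout, the deliberately tiny TABLE_SIZE, the entry record and the helper same_term are assumptions of this sketch; only ground terms made of atoms and structures are covered, and heap indices stand in for heap pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum tag { ATOM, FUNCTOR, STRUCT };
typedef struct { enum tag tag; unsigned val; unsigned arity; } cell;

#define HEAP_MAX 32
#define TABLE_SIZE 4               /* tiny on purpose, so that collisions occur */

typedef struct entry {
    unsigned hashvalue;
    unsigned term;                 /* heap index of the oldest equal body so far */
    struct entry *next;            /* collision chain */
} entry;

cell heap[HEAP_MAX];
entry *hashed_terms[TABLE_SIZE];
entry *cached_hash[HEAP_MAX];      /* parallel to the heap; NULL means no-info */

/* Structural equality of two ATOM or STRUCT cells (ground terms only). */
bool same_term(cell a, cell b)
{
    if (a.tag != b.tag) return false;
    if (a.tag == ATOM) return a.val == b.val;
    cell *fa = &heap[a.val], *fb = &heap[b.val];
    if (fa->val != fb->val || fa->arity != fb->arity) return false;
    for (unsigned i = 1; i <= fa->arity; i++)
        if (!same_term(fa[i], fb[i])) return false;
    return true;
}

/* Record the body at heap index p under hashvalue, keeping the oldest body. */
void save_hash(unsigned hashvalue, unsigned p)
{
    assert(p < HEAP_MAX);
    entry **bucket = &hashed_terms[hashvalue % TABLE_SIZE];
    for (entry *e = *bucket; e != NULL; e = e->next) {
        cell x = { STRUCT, e->term, 0 }, y = { STRUCT, p, 0 };
        if (e->hashvalue == hashvalue && same_term(x, y)) {
            if (p < e->term) e->term = p;   /* a smaller index is an older cell */
            cached_hash[p] = e;
            return;
        }
    }
    entry *e = malloc(sizeof *e);           /* no equal term yet: new entry */
    if (e == NULL) abort();
    e->hashvalue = hashvalue;
    e->term = p;
    e->next = *bucket;
    *bucket = e;
    cached_hash[p] = e;
}
```

Treating three equal f(a,b) bodies in the order middle, younger, older, as in Figure 6, leaves all three cached_hash entries pointing to one record whose term field is the oldest body, while a different term with the same hash value gets its own record in the chain.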



7.2 Phase II: Absorbing 

The absorb phase performs the actual representation sharing: an S-tagged pointer is redirected to the oldest term body that can absorb it. The code is very simple:

    foreach cell c in the heap
                   in the local stack
                   in the choicepoint stack
                   in the argument registers do
        let p be the contents of c;
        if (tag(p) == STRUCT)
        {
            q = untag(p, STRUCT);
            if (cached_hash[q-beginheap] points to hashed_terms)
                replace c by tag(cached_hash[q-beginheap]->term, STRUCT);
        }



Figure 7 shows how the older term absorbs the two identical younger terms.

Fig. 7. Absorption and GC in action: (a) just before absorbing; (b) after absorbing; (c) after one more GC
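The absorb phase can be sketched in C on a toy heap as follows. This is an illustration under assumptions, not the hProlog code: the array oldest stands in for cached_hash[...]->term and is taken as already filled in by a build phase, heap indices replace pointers, and a single roots array stands in for the local stack, the choicepoint stack and the argument registers.

```c
#include <assert.h>

enum tag { ATOM, FUNCTOR, STRUCT };
typedef struct { enum tag tag; unsigned val; unsigned arity; } cell;

#define HEAP_MAX 32
#define NO_INFO (-1)

cell heap[HEAP_MAX];
unsigned heap_top = 0;
int oldest[HEAP_MAX];   /* per functor cell: index of the oldest equal body */

/* Reset the parallel table before a sharer run. */
void init_tables(void)
{
    for (unsigned i = 0; i < HEAP_MAX; i++) oldest[i] = NO_INFO;
}

/* Redirect one STRUCT-tagged cell to the oldest equal body, if one is known. */
void absorb_cell(cell *c)
{
    if (c->tag != STRUCT) return;
    assert(c->val < HEAP_MAX);
    if (oldest[c->val] != NO_INFO)
        c->val = (unsigned)oldest[c->val];
}

/* Absorb phase: linear scan over the heap and over the root cells. */
void absorb(cell *roots, unsigned nroots)
{
    for (unsigned i = 0; i < heap_top; i++) absorb_cell(&heap[i]);
    for (unsigned i = 0; i < nroots; i++) absorb_cell(&roots[i]);
}
```

After absorb, the younger bodies are no longer referenced; in this sketch, as in the text, their space is only recovered by a later garbage collection.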



7.3 Comments on the Code 

The code in Section 7.1 ignores certain issues:

• checking whether a heap cell is trailed: during the initialization of the build phase, the cached_hash table entries corresponding to trailed heap entries are initialized to impossible; this requires traversing the trail once and it makes checking whether a cell is trailed constant time; the checks whether a heap cell is trailed are required during the dereferencing loop; when a trailed cell is encountered, the computation of the hash value is stopped and the corresponding cached_hash table entries of the term containing the trailed cell are also set to impossible

• other datatypes: the code takes into account only non-list structured terms, atoms and variables; it is easy to extend it to other types that occupy a single cell; for other atomic types (real, string, bigint) we have followed the same principle as for non-list structured terms: those types are implemented roughly like such terms, i.e., with a tagged pointer to a header on the heap which is followed by the actual value that can span several heap cells; for lists, we have a different solution: see Section 7.4

• foreach: our implementation uses a linear scan for the foreach constructs: this is possible for all the stacks in hProlog; if this is not the case, one can traverse the live data starting from the root set as the garbage collector does (e.g., during its marking phase)

The code implementing the above is less than 700 lines of plain C that reuses very little 
previously existing code. 

Note that in the context of our copying collector, the extra space needed for representation sharing is just the hashed_terms table: the cached_hash table has exactly the same size as the collector needs for performing its collector duties.



7.4 Representation Sharing of Lists 

In the WAM, lists have no header like other compound terms. A list is represented by an 
L-tagged pointer to two consecutive heap cells containing the first element of the list and 
its tail respectively. Clearly, we cannot deal with lists as in the previous algorithm. The 
change is however small: we keep the hashed_terms pointer in the cell corresponding to the list-pointer. Figure 8 shows an example with just lists.




Fig. 8. Absorption and GC in action: (a) just before absorbing; (b) after absorbing; (c) after one more GC



Note that functor cells can only appear on the heap, while list pointers can occur also in environments, choicepoints, and the argument registers. As a result, with just a hashed_terms pointer array parallel to the heap, some representation sharing in the other stacks can get lost for lists. A similar hashed_terms pointer array parallel to the other stacks can solve this problem: our implementation does not do that. Another solution consists in using the cell of the first element of a list for keeping the corresponding hashed_terms information. We have not explored that alternative.



7.5 When to run the Sharer 

It seems obvious that the sharer must be run either during GC, or just after GC. Our sharer can be adapted to run during GC most easily when the GC starts with a marking phase: the build phase of the sharer can indeed be integrated in the marking phase of the collector. The absorb phase can be run before the next GC phase, or be integrated with it. That would lead to a (mark+build)&(copy+absorb) collector for hProlog. In a sliding GC context, this would become (mark+build)&(compact+absorb).

Still, we chose from the beginning to run the sharer as an independent module that could actually be run at any time. Just after GC seems the best, because at that moment, the heap has minimal size. We name that policy after GC.

There is one snag in this: the space freed by the sharer cannot be used immediately, and the beneficial effect of the sharer can be seen only after the next GC. Therefore, it feels like immediately after the sharer, another GC should be done. We name that policy between GC.

We have therefore added an option to hProlog: 

• -r0: no sharing

• -r1: sharer with policy after GC

• -r2: sharer with policy between GC

Note that the absorb phase could estimate the amount of space it has freed, and the 
decision to switch from one policy to the other could be based on that. 
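
One possible shape of such a decision is sketched below in C. It is purely hypothetical: the function choose_policy, the idea of comparing the estimated freed fraction against a threshold, and the threshold value itself are assumptions of this sketch and do not describe what hProlog does.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { POLICY_AFTER_GC, POLICY_BETWEEN_GC } policy;

/* Hypothetical rule: run an extra GC right after the sharer (between GC)
   only when the absorb phase estimates that it freed at least `threshold`
   (a fraction between 0 and 1) of the live heap cells. */
policy choose_policy(size_t freed_cells, size_t live_cells, double threshold)
{
    assert(threshold >= 0.0 && threshold <= 1.0);
    if (live_cells == 0) return POLICY_AFTER_GC;
    double fraction = (double)freed_cells / (double)live_cells;
    return fraction >= threshold ? POLICY_BETWEEN_GC : POLICY_AFTER_GC;
}
```

As the blid/1 benchmark in Section 8.1.5 illustrates, such a rule would still have to be tuned together with the heap expansion policy of the collector.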



8 The Benchmarks and the Results 



Since (Appel and Gonçalves 1993) is closest to our representation sharing, we are inclined to use the same benchmarks. However, (Appel and Gonçalves 1993) shows overall very little impact of hash-consing and unfortunately, the benchmarks were not analyzed so as to explain why hash-consing is not effective on them. On the other hand, one cannot a priori assume that our sharer will show the same behavior, because of the differences between our respective implementations, and even the language:

• hProlog only performs major collections, while SML/NJ has a generational collector (with two generations)

• our sharer does not alter the representation of terms, while (Appel and Gonçalves 1993) performs hash-consing (which entails a representation change) on the old generation only

• SML/NJ is a deterministic language and a boolean SML/NJ function is like a semidet predicate in Prolog; however, in a typical Prolog implementation, the data it creates is (on failure) backtracked over in Prolog and the WAM recovers its space: this can have a huge impact on some benchmarks (the mandelbrot benchmark is an example)

• in (Appel and Gonçalves 1993) hash-consing was inseparably tied to the (generational) collector; in contrast, we have explicitly aimed at keeping the collector and the sharer separated (we argue why in Section ?); this has an impact on the efficiency of the sharing process

So it seems worthwhile to redo some of the benchmarks of (Appel and Gonçalves 1993). The following section describes those benchmarks as well as some others not appearing in (Appel and Gonçalves 1993).

8.1 The Benchmarks

8.1.1 Boyer 

Boyer is a famous benchmark initially conceived by R. Gabriel for Lisp, and later used in other functional and logic contexts. Essentially, it rewrites a term to a canonical form. Boyer has been the subject of many studies, some of which argue that it is not a good benchmark: see for instance (Baker 1992). Anyway, in (Appel and Gonçalves 1993), this benchmark shows the best results for hash-consing. The inherent reason is that terms are rewritten to a canonical form and thus many initially different terms end up the same. We measured that the final result of the rewriting process needs 39 834 heap cells without representation sharing, and only about 200 with representation sharing.

This makes boyer close to an optimal benchmark for showing the effectiveness of rep- 
resentation sharing. 

Note that the boyer benchmark also benefits a lot from tabling (Chen and Warren 1996).
This means that repeated computations are going on, which explains also the high amount 
of representation sharing. However, while tabling does avoid the repetition of duplicate 
computations, as usually implemented, it does not avoid the creation of duplicate terms on 
the heap. It is possible to add to the tries enough info so that ground terms need be copied 
only once to the heap as long as this copy is not backtracked over. 



8.1.2 Life 

(Appel and Gonçalves 1993) also uses the well known Game of Life as a benchmark. We have written a version in Prolog following the ideas of Chris Reade (Reade 1989), just as (Appel and Gonçalves 1993) did. A (live) cell is represented as a tuple in coordinate form (X,Y). A generation is a list of live cells. The program keeps a list of the first 1000 generations, starting from The Weekender*, which is a glider, i.e., a pattern that repeats itself after a few generations (7 in this case) translated a few cells (2 in this case). If just the most recent generation is kept alive, one expects little from running the sharer immediately after a major collection, as the just rewritten generation is garbage. Our benchmark still shows some 50% memory improvement, because it keeps all computed generations in a list, so that the existing overlap between generations is shared.

Fig. 9. The Weekender

(Appel and Gonçalves 1993) shows little gain from hash-consing for this benchmark, but we could not retrieve the initial generation(s) on which the benchmark was run.

* See http://fano.ics.uci.edu/ca/rules/b3s23/gl0.html

8.1.3 Mandelbrot

This benchmark was also used in (Appel and Gonçalves 1993): it computes (actually outputs) a bitmap of a Mandelbrot set of a given dimension. Since the output does not play a role in the heap usage, we have removed the code for the output. We took the version from the Computer Language Benchmarks Game (http://shootout.alioth.debian.org/) written for Mercury and based on a version by Glendon Holst. Mandelbrot uses quite a bit of heap and as such appears a good memory benchmark. However, one can see quickly that literally all memory used by mandelbrot is by floating point numbers: computed floating point numbers have the tendency to be different and therefore representation sharing might not have much effect. We have indeed checked that half of the generated floating point numbers are unique during the benchmark.

Almost all the floating point numbers are generated during a ground call to mandel/5, a 
semidet predicate called as the condition in an if-then-else as follows: 



    ( mandel(Height, Width, Y, X, 50) ->
        ByteOut1 is (ByteOut0 << 1)
    ;
        ByteOut1 is (ByteOut0 << 1) \/ 0x1
    )



In the setting of (Appel and Gonçalves 1993) (generational collection + hash-consing) the Mandelbrot benchmark has the following characteristic: if the garbage collector runs during the test (mandel/5) then a few floats are copied to the older generation, otherwise, no float from the new generation survives the collection. So, not even all computed floats end up in the zone subject to hash-consing.

In our setting (only major collections + representation sharing), at each collection, only some floats in the test are alive. Exactly at that moment, the chance for duplicates is very small.

The effect of hash-consing or representation sharing is expected to be very small for the 
mandelbrot benchmark. 

Our test runs of the mandelbrot benchmark indeed show zero gain from representation 
sharing. 



8.1.4 One more classical Prolog benchmark: tsp 



We were unable to retrieve more benchmarks from (Appel and Gonçalves 1993), so we tried different benchmarks from the established general Prolog benchmark suite. None showed any benefit from representation sharing. We report only on tsp: just like mandelbrot and the other benchmarks showing no benefit, it is mainly good for showing the overhead of the useless sharer.



8.1.5 blid/1 

The next program was altered slightly from what Ulrich Neumerkel posted in comp.lang.prolog; it appears also in his Diplomarbeit (Neumerkel 1989).






    blid(N) :-
        length(L, N),
        blam(L),
        id(L, K),
        use(K).

    blam([]).
    blam([L|L]) :-
        blam(L).

    id([], []).
    id([L1|R1], [L2|R2]) :-
        id(L1, L2),    % L1 = L2
        id(R1, R2).    % R1 = R2

    use(_).



His question was: "Are there systems that execute a goal blid(N) in space proportional to N? Say blid(24)." At first we expected that with our representation sharing, space would indeed be linear in N. However, the expansion policy and the order in which events (garbage collection and representation sharing) take place are also crucial.

• with the after GC policy, the following happens:

    a1: the first GC finds that 99% (or more) of the data is live, and decides to expand the heap
    a2: the sharer shares most data
    a3: the next triggered GC finds that about half of the heap is live, so does not expand
    a4: the following sharer shares most of the data
    a5: points a3 and a4 are repeated

• with the between GC policy, the following happens:

    b1: the first GC finds that 99% (or more) of the data is live, and decides to expand the heap
    b2: the sharer shares most data
    b3: the second GC collects almost all data
    b4: points b1, b2 and b3 are repeated

The first GC in a1 and b1 is triggered by lack of space, the second GC (in b3) is there by policy. A GC can decide to expand the heap (in hProlog when the occupancy is more than 75%: this is known after marking). So one sees that in the case of the between GC policy, the heap is repeatedly expanded, even though the program could run in constant space (with the aid of the sharer). With the after GC policy, we do not get into this repeated expansion.

If hProlog also had a heap shrinking policy, the between GC policy would after its second collection shrink the heap, and this would amount to almost the same effect as the after GC policy.

This shows that the combination of a reasonable heap expansion policy and a reasonable sharer policy can result in an overall bad policy. More work could be done on this.

Note that in its original form, id/2 also contains the two commented out unifications, and that with unification factoring these would also introduce the sharing needed to run in O(N) heap (always with the aid of GC of course).

8.1.6 Four Applications

The next four benchmarks provide some insight in what to expect from the sharer in a some 
typical appHcations of Prolog: there is little impact on memory and performance. 

Tree Learner. This realistic benchmark consists of a best-first relational regression tree
learner written by Bernd Gutmann (Gutmann and Kersting 2006). The program is about
900 LOC. It works on a data set of 350K facts.

Emul. Emul is a BAM emulator (Van Roy and Despain 1992) written by Peter Van
Roy in Prolog. The benchmark consists in executing the BAM code for the famous
SEND+MORE=MONEY problem. It is about 1K LOC.



An XSB compiler. xsbcomp is an old version of the XSB compiler (Sagonas et al. 1994)
and a run of the benchmark consists in compiling itself. The XSB compiler is about 5K
LOC and also uses the hProlog or SICStus Prolog reader, which are also in Prolog.

The hProlog compiler. In this benchmark, the hProlog compiler compiles itself. It uses
setarg/3 heavily, so we need a mutable/1 (see Section 9) declaration for 7 functors. This
benchmark cannot be run by SICStus. The hProlog compiler (which is a version of the
ilProlog compiler (Blockeel et al. 2000) written by Henk Vandecasteele) plus all other code it
needs (reader, optimizer ...) totals more than 10K LOC.



8.1.7 Worst and best Case

It is not clear what the best and worst case for our sharer is: if the heap were just one huge
flat term (say of the form f(1,2,3,...)) then only one hash value would have to be saved in
the hashed_terms table, and in some sense that is both best and worst, because the least time
is lost in collisions etc., but also no sharing can be performed. So we choose the following
as best-versus-worst case: a large complete binary tree in which every node is of the form
node(tree,tree,number). In what we consider the best case, the number is always the same
(and thus resembles a bit the blid data structure in Section 8.1.5). This leads to a very
sparse hashed_terms table, and a large amount of sharing. In the worst case, the number
is different in all nodes: as a result, the hashed_terms table becomes quite full, and no
sharing is possible at all. The main reason for this benchmark is to find out how the build
and the absorb phase contribute to the total time of the sharer.
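
The difference between the two cases can be made concrete with a small hash-consing sketch. It is not the hProlog sharer: terms are modelled as nested Python tuples, leaves are assumed to be an atomic constant `nil` that needs no table entry, and a Python dict stands in for the hashed_terms table. The names `build_tree`, `share` and `table_size` are hypothetical.

```python
import itertools


def build_tree(depth, next_number):
    """Complete binary tree of node(Left, Right, Number) terms as nested tuples."""
    if depth == 0:
        return "nil"  # atomic leaf (assumption): atoms need no table entry
    left = build_tree(depth - 1, next_number)
    right = build_tree(depth - 1, next_number)
    return ("node", left, right, next_number())


def share(term, table):
    """Hash-cons `term` bottom-up and return its canonical id."""
    if not isinstance(term, tuple):
        return term
    key = ("node", share(term[1], table), share(term[2], table), term[3])
    # The first term with this shape becomes the shared representative.
    return table.setdefault(key, ("id", len(table)))


def table_size(depth, next_number):
    """Number of distinct node/3 terms found in a tree of the given depth."""
    table = {}
    share(build_tree(depth, next_number), table)
    return len(table)


best = table_size(10, lambda: 0)                     # same number everywhere
worst = table_size(10, itertools.count().__next__)   # all numbers different
```

With depth 10 the tree has 1023 node/3 terms. In the best case only one distinct term per level remains, so the table holds 10 entries. In the worst case every node is distinct, and the table holds all 1023.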



8.2 The Benchmark Results for Representation Sharing 

The results are shown in Table 3. Time is in milliseconds. Space is in MiB or KiB as
indicated in the table.

The first two columns denote the benchmark and system used (with the sharing option 
for hProlog). Then follow the total time taken by heap garbage collection (including the 
stack shifter), the total time taken by the sharing module, the total execution time and the 
number of garbage collections. Then follow four columns related to space: the initial heap 
size, the final heap size and the amount of space collected by the garbage collections are
given in megabytes. Finally, there is the heap high water mark at the end of the benchmark
given in KiB instead of MiB because the figures vary widely. It measures the size of the
result computed by the benchmark and it includes a small system specific overhead from
the toplevel: for mandelbrot, the figure is just that overhead.

Table 3 sometimes shows a large difference between the memory consumption of
SICStus Prolog and hProlog. Also the time spent in garbage collection, and the number
of collections can be very different. The reason is that although both systems are based on
the WAM, they differ in a number of other design decisions. In particular, their heap
expansion policy differs, their garbage collectors differ (the SICStus Prolog one is
generational and compacting, while the hProlog one is non-generational and copying), they
have a different approach to floating point arithmetic, and hProlog does not allocate free
variables in the local stack.

In addition to the results in Table 3, we can also mention that the build phase takes
between 8.3 (for blid) and 2.6 (for worst) times as long as the absorb phase: the absorb
phase is indeed much simpler.



8.3 Conclusions from the Benchmarks 

By and large our results confirm the findings of (Appel and Gonçalves 1993): most
benchmarks hardly benefit from representation sharing, and sometimes the space and time
performance becomes worse. Apart from the artificial benchmark blid/1, only for boyer do we
find a much larger (huge in fact) benefit from representation sharing than in
(Appel and Gonçalves 1993). We have not been able to pinpoint why; the benchmarks used in
(Appel and Gonçalves 1993) are not even available anymore, let alone the queries. The fact
that generational collection retains terms longer than an only-major-collections strategy
might play a role. Still, our result is in line with the (confluent) rewriting character
of boyer.

The time taken by our implementation of representation sharing is reasonable: the
algorithm is linear in the size of the heap, local stack, choice point stack and trail, so
complexity-wise it is not worse than an actual garbage collection. The traversal of the
stacks is however less complicated, since one does not need to take into account the
liveness of the locations anymore and less copying is going on. In our application
benchmarks, the sharer always takes less time than the garbage collection. It is clear that
a better policy, and improvements to our implementation code, can make the sharer even more
efficient. Our sharer does not depend on the efficiency of the underlying Prolog system, nor
on that of its garbage collector, so we feel it is safe to say that our sharer can be
implemented with the same (or better) performance in other WAM-like systems.



9 Variations, Extensions and related Issues 

Unusual Sharing. In (Demoen 2002), the rather unusual representation sharings depicted
in Figure 10 are described.

Our current representation sharing implementation does not achieve the above sharings. 
Still, all ingredients are present and while the expected gains are small, it is nice that the 
above unusual sharing can be achieved in time linear in the size of the heap (assuming 
perfect hashing). 
Table 3. The Sharer and the Collector
[Figure 10: heap diagrams of shared representations for X=f(a,b), Y=[a|b]; for
f(a,b,[a|b]); and for X=[a|b], Y=[b|c], Z=[c|d]]

Fig. 10. Unusual representation sharing



Cyclic Terms. (Appel and Gonçalves 1993) deals with cyclic terms by excluding them
from hash-consing. It is easy to do the same in our implementation as follows:

1. besides the special values no-info and impossible, a cached_hash entry can also have
the value busy

2. when a functor cell is visited for the first time, the corresponding cached_hash entry
is set to busy

3. when a functor cell is visited recursively, a check on the corresponding cached_hash
entry detects that there is a cycle: the field is set to impossible

4. as usual, when a term is visited completely, its corresponding field is set to an
appropriate value, i.e., impossible or a pointer to the hashed_terms table

However, one can do better: a variation of point 3 above yields a procedure that can 
perform representation sharing also for cyclic terms. 

3'. when a functor cell is visited recursively, a check on the corresponding cached_hash
entry detects that there is a cycle: a fixed value (say 17) is returned as the hash value
of this term; the corresponding cached_hash entry is not updated at this time: this
happens when the visit has returned to the point where the entry was set to busy

The procedure for testing equality of terms must also be adapted to deal correctly with 
cycles: this is common practice now in most Prolog systems. 

Note that it does not matter which value is chosen in (3') above. What matters is only that 
the hash value of terms that can share their representation is the same. Still, our procedure 
can attach a different hash value to cyclic terms that are equal (in the sense of ==/2) and 
could share their representation. This results in no representation sharing for those cyclic 
terms. As an example: 



test :- X = f(1,f(1,X)), share, use(X).



does not result in the same heap representation as



test :- X = f(1,X), use(X).



The procedure based on minimization of finite automata described in (Neumerkel 1989)
does.
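
The hashing rule of variation 3' and its limitation can be illustrated with a small sketch. It makes several assumptions that are not in the paper: the heap is a Python dict from addresses to (functor, arguments) pairs, an argument is either an integer constant (hashing to itself) or a ('ref', address) pair, the cached_hash entry stores the hash value itself rather than a pointer into hashed_terms, and `combine` is an arbitrary mixing function chosen for the example.

```python
NO_INFO, BUSY = "no-info", "busy"
CYCLE_HASH = 17          # the fixed value of variation 3'
MOD = 2**61 - 1


def combine(functor, arg_hashes):
    """Arbitrary deterministic mixing function (an assumption of this sketch)."""
    h = sum(map(ord, functor)) * 31 + len(arg_hashes)
    for a in arg_hashes:
        h = (h * 1000003 + a) % MOD
    return h


def term_hash(heap, addr, cached_hash):
    """Hash the term at heap[addr], using the busy marker to survive cycles."""
    if cached_hash[addr] == BUSY:
        return CYCLE_HASH            # cycle detected: entry is not updated here
    if cached_hash[addr] != NO_INFO:
        return cached_hash[addr]     # already computed
    cached_hash[addr] = BUSY         # first visit of this functor cell
    functor, args = heap[addr]
    hashes = [term_hash(heap, a[1], cached_hash) if isinstance(a, tuple) else a
              for a in args]
    cached_hash[addr] = combine(functor, hashes)   # visit completed
    return cached_hash[addr]


heap = {
    0: ("f", [1, ("ref", 0)]),   # X = f(1,X)
    1: ("f", [1, ("ref", 2)]),   # Y = f(1,Z)
    2: ("f", [1, ("ref", 1)]),   # Z = f(1,Y), so Y = f(1,f(1,Y))
}
cached_hash = {addr: NO_INFO for addr in heap}
hash_x = term_hash(heap, 0, cached_hash)
hash_y = term_hash(heap, 1, cached_hash)
```

Here X and Y denote the same infinite term, yet `hash_x` and `hash_y` differ, because the cycle is closed at a different depth. Such terms therefore end up in different table slots and are not shared, which is the limitation noted above.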
Mutable Terms. Prolog systems supporting destructive update (through setarg/3, mutable
terms or for attributed variables) often do this using a trail in which each entry keeps
the old value: clearly, these old values can point to sharable terms and they can be updated
accordingly in the final absorb phase.

However, just as a ground mutable term must be copied by copy_term/2, a mutable term
itself is not allowed to absorb or be absorbed. This means that mutable terms should be
recognizable during the build phase. In SICStus Prolog this is the case ($mutable/2 is
reserved for this), but not so in other systems (e.g., SWI Prolog, Yap, hProlog ...). In
hProlog we have resolved that problem by introducing a declaration: :- mutable foo/3.
declares that the arguments of any foo/3 term can be destructively updated, and effectively
prevents sharing of foo/3 terms. We use one bit in the functor table and the overhead during
the build phase is unnoticeable. Note that the :- mutable declaration does not readily work
across modules.
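
The effect of such a declaration on the build phase can be sketched as follows. This is an illustration only: terms are nested Python tuples, the set `mutable` of (name, arity) pairs stands in for the bit in the functor table, and the sketch makes the conservative assumption that a term containing a mutable subterm is not shared either. The names `share` and `IMPOSSIBLE` are hypothetical.

```python
IMPOSSIBLE = None   # marks a term that may neither absorb nor be absorbed


def share(term, table, mutable):
    """Return a canonical id for `term`, or IMPOSSIBLE if it must stay private.

    Terms are tuples (functor, arg1, ..., argN); anything else is atomic.
    """
    if not isinstance(term, tuple):
        return ("atomic", term)
    name, args = term[0], term[1:]
    keys = [share(a, table, mutable) for a in args]
    # One membership test per functor cell: the stand-in for the functor-table bit.
    if (name, len(args)) in mutable or IMPOSSIBLE in keys:
        return IMPOSSIBLE
    key = (name, *keys)
    return table.setdefault(key, ("id", len(table)))


mutable = {("foo", 3)}            # corresponds to  :- mutable foo/3.
table = {}
bar1 = share(("bar", 1), table, mutable)
bar2 = share(("bar", 1), table, mutable)
foo1 = share(("foo", 1, 2, 3), table, mutable)
wrapped = share(("g", ("foo", 1, 2, 3)), table, mutable)
```

The two bar/1 terms receive the same id and can share one representation, while the foo/3 term and the term wrapping it are left alone.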

Cooperation between Collector and Sharing. We have implemented the representation 
sharing module independent of the garbage collector module. The advantage is less de- 
pendency and a higher potential that the sharer can be integrated in other systems. The 
disadvantage is that some information that the garbage collector has computed, needs to be 
recomputed by the sharing module. For instance, the collector might leave behind informa- 
tion on which cells are trailed, and which cells contain sharable information. This would 
speed up the sharer and in particular the build phase.

What if Representation Sharing does not work. The benchmark programs show that
representation sharing is not always effective: it depends indeed highly on the type of
program. When representation sharing does not work, this can be noticed during a run of the
representation sharing module by observing the hashed_terms table. If it keeps growing, it
means that lots of different terms are found. This in turn gives an indication that
representation sharing is not effective. An important advantage of our implementation is
that the representation sharing process can be abandoned at any time since no changes to the
WAM run-time data structures are made until the absorb phase in which structure (or list
...) pointers are updated, and even the absorb phase can be stopped before finishing. Also,
if representation sharing is run from time to time only (as suggested by Ulrich Neumerkel)
then the frequency of running it can take into account the effectiveness of representation
sharing up to that moment. Such tuning could depend also on the relative performance of the
garbage collector and the representation sharing module.
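
One way to turn this observation into a policy is to watch the ratio of distinct terms to visited terms during the build phase and to give up when it stays high. The sketch below is a guess at such a policy, not the one used in hProlog: the thresholds `min_seen` and `max_distinct_ratio` are hypothetical, and hashable Python values stand in for heap terms.

```python
def build_phase(terms, min_seen=1000, max_distinct_ratio=0.9):
    """Fill a hashed_terms table; abandon if almost every term is distinct.

    Returns the table, or None when the build phase is abandoned.
    """
    hashed_terms = {}
    for seen, term in enumerate(terms, start=1):
        # The first occurrence of a term becomes its representative.
        hashed_terms.setdefault(term, seen)
        if seen >= min_seen and len(hashed_terms) > max_distinct_ratio * seen:
            # Nothing on the stacks has been changed yet, so stopping is safe.
            return None
    return hashed_terms


all_different = build_phase(range(5000))                    # abandoned
many_duplicates = build_phase(i % 10 for i in range(5000))  # completed
```

Because the table is only consulted by the later absorb phase, abandoning the build phase leaves the program state untouched. The outcome could also feed into how soon the sharer is run again.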

Parallelization. During the scanning phase, the stacks (heap, local stack ...) are read-only,
while the cached_hash and the hashed_terms tables can be read and written by different
workers. During the absorb phase, the cached_hash and hashed_terms tables are read-only, and
only the stacks are written to.

By giving different workers a different part of the heap to start working on, duplicate 
work might be avoided and synchronization slowdown kept low in the scanning phase. 
During the absorb phase, giving different workers different parts of the stacks makes their 
actions completely independent. 
Variable Chains. We have not treated variable chains in much detail, as we were mostly
interested in sharing between the bodies of compound data. However, a slight extension of
the code for the build phase can also call save_hash for all reference cells. That results in
a similar effect as variable shunting as described in (Sahlin and Carlsson 1991), but is not
as complete as the method described there. Figure 11 shows an example of how a chain of
references is transformed.
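
The intended effect on a chain of references can be sketched on a toy heap. This is one possible reading, not the hProlog code: the heap is a Python list of (tag, value) cells, an unbound variable is a reference cell pointing to itself, and trailing and choice points are ignored entirely, although a real implementation has to respect them. The names `deref` and `shorten_chains` are hypothetical.

```python
def deref(heap, addr):
    """Follow reference cells from addr until a non-reference or unbound cell."""
    while heap[addr][0] == "ref" and heap[addr][1] != addr:
        addr = heap[addr][1]
    return addr


def shorten_chains(heap):
    """Make every bound reference cell point directly at the end of its chain."""
    targets = {addr: deref(heap, addr)
               for addr, cell in enumerate(heap) if cell[0] == "ref"}
    for addr, target in targets.items():
        if target != addr:           # leave unbound variables alone
            heap[addr] = ("ref", target)
    return heap


# Cell 0 refers to cell 1, which refers to the atom b in cell 2.
heap = shorten_chains([("ref", 1), ("ref", 2), ("atom", "b")])
```

After the pass both reference cells point directly at the cell holding b, so later dereferencing takes a single step.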



[Figure 11: a heap with two chained ref cells ending in the atom b, together with the
hashed_terms table: (a) just before absorbing, (b) after absorbing]

Fig. 11. Absorption for chains of references in action



Backtrackable Representation Sharing. Backtrackable representation sharing would fol- 
low the principle that when two terms are identical (as for ==/2) then one can absorb the 
other, regardless of whether they have trailed cells or not. The change made (to a LIST 
or STRUCT-tagged pointer) by the absorb phase is now conditionally (and value) trailed. 
This costs extra trail space of course. On cut, the trail can be tidied, so in case the computa- 
tion becomes eventually deterministic, the amount of sharing can be arbitrarily larger than 
without this form of backtrackable representation sharing. However, suppose that all shar- 
ing were trailed, then it is possible that an immediately following GC would not be able 
to recover anything. And if the computation becomes deterministic eventually, running 
the sharer will do the same job as was done in the case of the backtrackable representa- 
tion sharing, only later - which might be even better, because the earlier sharing could 
have been unnecessary because backtracking has destroyed it. All in all, our feeling is that 
backtrackable representation sharing is not worth its while. 

Partial Sharing. Partial sharing refers to running the sharer in an incomplete way, i.e., it 
achieves part of its potential effect, but maybe not all. 

Partial sharing can result for instance from restricting the part of the heap in which 
duplicate terms are identified, i.e., restricting the scan phase to part of the heap. Another 
possibility is to restrict sharing to certain terms, e.g., just for lists, or to certain parts of 
the other stacks. It is one of the strengths of our implementation approach that all such 
variations can be incorporated rather easily. 

Incremental Sharing. The notion of incremental sharing refers to the possibility to
perform a partial sharer pass, e.g., on part of the heap, and continue that pass later on,
eventually obtaining the same effect as running the sharer completely. The ability to perform
partial sharing is certainly needed, but there is more: information must be passed from one
partial run to the other, and the user program and the sharer must be able to run in an
interleaved way. This immediately raises the question of completeness, but efficiency is
also at stake.

The issues with incremental sharing are similar to the ones with generational sharing in
the next paragraph and we do not discuss incremental sharing separately any further.

Generational Sharing. The notion of generational sharing refers to the possibility to
avoid performing sharing on a part of the heap on which it was performed earlier. In
analogy with generational garbage collection, there is a rationale for performing
generational sharing: for generational garbage collection, the rationale is that new objects
tend to die quickly. For generational sharing the rationale is that redoing sharing on old
data (on which sharing was performed earlier) does not pay off.

Our strategy for non-generational sharing is to recompute the cached_hash and
hashed_terms tables from scratch every time after a new garbage collection. With
generational sharing, one would like to reuse the part of the tables corresponding to the
older generation.

We reason about forward computation first: The information on terms in the older gen- 
eration that were ground at the previous run of the sharer and eligible for sharing at that 
moment is still valid. The same is generally not true for a non-ground term: it can now 
contain cells that are trailed, and in that case the information about the term is to be dis- 
carded from the table, or at least not used. Since it is not straightforward to keep track of 
which information in the tables is no longer valid because of this reason, it might be best 
to restrict a generational sharer to ground terms only. 

Now suppose that backtracking has taken place between two activations of the sharer: 
generally, this invalidates entries in the sharer tables because terms have disappeared. It 
is easy to adapt the cached_hash table (it shrinks with the heap on backtracking), but the
hashed_terms table also needs to be adapted. By keeping high and low water marks of the 
top of heap pointer, this can also be achieved. The cost of adapting the tables might be 
larger than the cost of rebuilding them however. 



10 Related Work 



(Appel and Gonçalves 1993) describes how hash-consing can be performed during garbage
collection in an implementation of Standard ML (SML/NJ). The collector is generational, 
and the data structures in the old generation are hash-consed. In this way, the operation 
of hash-consing is restricted to data structures that are expected to live long. The reported 
performance and space gains are disappointing: half of the benchmarks lose performance 
(up to 25%) and the gain is maximally 10% (for boyer). The space improvement is even 
smaller: on most benchmarks less than 1%. Also for space, boyer is the exception with 
about 15%. Note however, that these space figures are about the amount of data copied to 
the older generation, i.e., the data that is hash-consed, and which is collected infrequently. 
As such, these numbers do not give full insight in the potential of hash-consing. Still, 
(Appel and Gonçalves 1993) is most closely related to our implementation of representation
sharing for Prolog: our strategy is to perform representation sharing after a (major)
garbage collection, so we introduce sharing only for data that just survived a collection.

Mercury (Somogyi et al. 1996) is basically a functional language, and the issue of
trailing does not enter. In the developers' mailing list in August 1999, the issue of
hash-consing was raised with a proposal for an implementation as well as how to present it
to the user. It is interesting that at some point, the opposite of our :- mutable
declaration was proposed. As an example, :- pragma hash_cons(foo/3) tells the compiler to
hash-cons the constructors of type foo/3. As far as we know, the proposals were not
implemented.

Last but not least, (Neumerkel 1989) provides the example blid/1, and gives a high-level
outline of an algorithm for minimization of heap terms seen as DFAs. Our implementation
can be seen as a concrete version of that algorithm. However, our minimization shows
mostly similarities with (Ershov 1958) in which Ershov uses (for the first time in the
published history of computer science) hashing to detect common subtrees in a given tree.



11 Conclusion 

Without the questions by Ulrich Neumerkel on comp.lang.prolog, we would not have
worked on this topic and we are grateful for his insistence that Prolog systems should
have a sharer. We have provided a practical and efficient implementation of representation
sharing that can be incorporated without problems in most WAM based systems. Our
implementation has the advantage that it does not rely on a particular garbage collection 
strategy or implementation. On the other hand, a tighter integration of the garbage collector 
with the representation sharing module can make the latter more efficient. Still, represen- 
tation sharing is not effective for all programs, so it must not be applied indiscriminately, 
i.e., it needs its own policy. We have also shown that input sharing for findall/3 is easy to 
implement. 
Appendix: the relevant Part from the Program in (O'Keefe 2001)



tree_children(node(_,Children), Children).

top_pointer(Tree, ptr(Tree,[],[],no_ptr)).

up_down_star(A, D) :-
    (   A = D
    ;   up_down_plus(A, D)
    ).

up_down_plus(A, D) :-
    (   var(A) ->
        up_down(X, D),
        up_down_star(A, X)
    ;   up_down(A, X),
        up_down_star(X, D)
    ).

up_down(P, ptr(T,L,R,A)) :-
    (   var(P) ->
        A = ptr(_,_,_,_),    % not no_ptr, that is.
        P = A
    ;   A = P,
        P = ptr(Tree,_,_,_),
        tree_children(Tree, Children),
        % split Children++[] into reverse(L)++[T]++R
        split_children(Children, [], L, T, R)
    ).

split_children([T|R], L, L, T, R).
split_children([X|S], L0, L, T, R) :-
    split_children(S, [X|L0], L, T, R).

mk_tree(D, node(D,C)) :-
    (   D > 0 ->
        D1 is D - 1,
        C = [T1,T2,T3,T4],
        mk_tree(D1, T1),
        mk_tree(D1, T2),
        mk_tree(D1, T3),
        mk_tree(D1, T4)
    ;   C = []
    ).

f1(N) :- mk_tree(N, T),
    top_pointer(T, P),
    findall(Q, up_down_star(P, Q), _L).



